Abstract

Collaborative spectrum sensing can significantly improve the detection performance of secondary 
unlicensed users (SUs). However, the performance of collaborative sensing is vulnerable to sensing data 
falsification attacks, where malicious SUs (attackers) submit manipulated sensing reports to mislead 
the fusion center's decision on spectrum occupancy. Moreover, attackers may not follow the fusion 
center's decision regarding their spectrum access. This paper considers a challenging attack scenario 
where multiple rational attackers overhear all honest SUs' sensing reports and cooperatively maximize 
attackers' aggregate spectrum utilization. We show that, without attack-prevention mechanisms, honest 
SUs are unable to transmit over the licensed spectrum, and they may further be penalized by the primary 
user for collisions due to attackers' aggressive transmissions. To prevent such attacks, we propose two 
novel attack-prevention mechanisms with direct and indirect punishments. The key idea is to identify 
collisions to the primary user that should not happen if all SUs follow the fusion center's decision. 
Unlike prior work, the proposed simple mechanisms do not require the fusion center to identify and 
exclude attackers. The direct punishment can effectively prevent all attackers from behaving maliciously.
The indirect punishment is easier to implement and can prevent attacks when the attackers care enough 
about their long-term reward. 



This work is supported by the General Research Funds (Project Number 412509) established under the University Grant 
Committee of the Hong Kong Special Administrative Region, China. Corresponding author: Jianwei Huang. 



September 7, 2011



DRAFT 




I. Introduction 

Cognitive radios enable secondary unlicensed users (SUs) to opportunistically access licensed 
spectrum bands when they are not being used by primary licensed users (PUs), and thus can 
effectively improve spectrum utilization [10]. As a key technology for realizing opportunistic
spectrum access while protecting PU communications, spectrum sensing aims to detect the 
presence or absence of a primary signal with high accuracy. To provide sufficient protection, 
spectrum sensing must be able to detect even a very weak primary signal, e.g., -20 dB for a 
DTV signal in the IEEE 802.22 WRANs [37]. To meet such a stringent requirement, researchers
have proposed the use of collaborative spectrum sensing to improve detection performance
by exploiting sensor location diversity [11], [16], [30], [33]. In collaborative spectrum sensing,
multiple sensors^1 sense the spectrum individually and then report their sensing results to a central
node (i.e., fusion center) for a final decision on spectrum occupancy.

Collaborative sensing, however, is vulnerable to critical attacks, such as sensing data falsification
attacks, which are difficult to detect. In cognitive radio networks (CRNs), sensors can be deployed in unattended
and hostile environments, and thus can be compromised by attackers. Thus, compromised or 
malicious sensors can intentionally send distorted sensing results to the fusion center in order 
to disrupt the incumbent detection process flU, [23], [31]. Such attacks can be easily launched
due to the openness of the low-layer protocol stacks of cognitive radio devices [44]. However,
it is challenging for the fusion center to accurately validate the integrity of sensing reports 
because of the two unique features in spectrum sensing — unpredictability in wireless channel 
signal propagations and lack of coordination between PUs and SUs. The sensing data falsification 
attack will ultimately result in a waste of spectrum opportunities (in the form of false alarms), 
and/or excessive interference to the PU communications (in the form of missed detections). 
Therefore, this poses a significant threat to the implementation of cognitive radio technology, 
and thus calls for efficient attack detection and prevention mechanisms. 

In this paper, we consider an attack scenario in which multiple attackers (i.e., compromised 
SUs/sensors) cooperate to maximize their aggregate spectrum utilization in cognitive radio 
networks (CRNs). Despite the serious threat posed by collaborative attacks, attacker collaboration
has not been fully considered in CRNs. We focus on the particularly challenging attack scenario
in which attackers can overhear all honest SUs' sensing reports, whereas the honest SUs are
unaware of the existence of attackers. This information asymmetry gives the attackers maximum
capability to launch attacks and achieve their goals. We design attack-prevention mechanisms
that safeguard collaborative sensing in such a challenging attack scenario, which constitutes the
main contribution of this paper.

^1 We use the terms "SU" and "sensor" interchangeably throughout the paper.

We consider two different attack scenarios: the "attack-and-run" scenario in which attackers
only care about an immediate reward, and the "stay-with-attacks" scenario in which attackers care 
about the long-term reward. We first analyze the impact of attacks on honest SUs in the absence 
of attack-prevention mechanisms. Then, we propose two attack-prevention mechanisms: a direct 
punishment scheme that can effectively prevent attacks in both scenarios mentioned above, and 
an indirect punishment scheme that is easier to implement and effectively prevents attacks in the 
"stay-with-attacks" scenario. The key idea of both mechanisms is to discourage attackers from 
launching attacks by designing efficient attack detection and punishment strategies. 

The key results and organization of this paper are summarized as follows. 

• A spectrum-sharing model with collision penalty: In Sections II and III, we introduce the
concept of collision penalty, which requires the SUs to compensate a PU for collisions
in utilizing the spectrum. The collision penalty is designed to protect the PU's exclusive
spectrum usage and encourage the PU to open its licensed spectrum to SUs.

• Understanding cooperative attackers' optimal behaviors: In Section IV, we theoretically
show that in the absence of attack-prevention mechanisms, attackers will utilize all spectrum
opportunities exclusively, whereas honest SUs cannot transmit and may even suffer from
the collision penalty caused by attackers (see Table I).

• Effective direct punishment: In Section V, we design a direct punishment mechanism that
can detect attacks and punish the attackers. This requires an efficient way for the fusion
center to directly punish SUs. The proposed mechanism can prevent all attacks in both
"attack-and-run" and "stay-with-attacks" scenarios (see Table I). We further show that a
single attacker makes the network most vulnerable under this mechanism.

• Effective indirect punishment: In Section VI, we propose an indirect attack-prevention
mechanism that is easy to implement when direct punishment is infeasible. The key idea is
to terminate collaborative sensing when an attack is detected. The proposed mechanism can
prevent all attacks if the attackers care enough about their long-term reward (see Table I).
Unlike the direct punishment, the presence of a larger number of attackers may make the
network more vulnerable.

TABLE I
Key Results for different attack scenarios

Attack Scenarios | Attack-and-run | Stay-with-attacks
No Punishment (Sec. IV) | Attacks happen and honest SUs always lose transmission opportunities (both scenarios)
Direct Punishment (Sec. V) | Completely prevent attacks (both scenarios)
Indirect Punishment (Sec. VI) | Cannot prevent attacks | If attackers focus on long-term reward: completely prevent attacks; if attackers focus on short-term reward: partially prevent attacks

A. Related Work 

There has been a growing interest in attack-resilient collaborative spectrum sensing in CRNs
(e.g., (HI, [23]-[25], [31]). Liu et al. [5] explored the problem of detecting unauthorized usage
of a primary licensed spectrum. In this work, the path-loss effect is studied to detect anomalous
spectrum usage, and a machine-learning technique is proposed to solve the general case.
Chen et al. [23] focused on a passive approach with robust signal processing, and investigated
the robustness of various data-fusion techniques against sensing-targeted attacks. Kaligineedi et al.
[25] presented outlier detection schemes to identify abnormal sensing reports. Min et al. [8]
proposed a mechanism for detecting and filtering out abnormal sensing reports by exploiting
shadow-fading correlation in received primary signal strengths among nearby SUs. Fatemieh et
al. [24] used outlier measurements inside each SU cell and collaboration among neighboring
cells to identify cells with a significant number of malicious nodes. Li et al. [31] detected
possible abnormalities according to SU sensing report histories.

Our work is different from existing approaches in three aspects. First, we consider cooperation 
among attackers, so the attacks are much more challenging to prevent. Second, unlike the previous 
work which focused on sensing data falsification attacks, we also consider the case where the 
attackers violate the fusion center's decision regarding spectrum access. Finally, our proposed 
attack-prevention mechanisms can easily prevent attacks without differentiating attackers from 
honest SUs. 
Fig. 1. An illustration of cooperative spectrum sensing in cognitive radio networks: The figure shows a secondary network
with N = 6 SUs including M = 2 malicious SUs (i.e., attackers). The SUs periodically perform spectrum sensing and report
the local (binary) decisions to the fusion center (the solid arrows). The fusion center makes a final decision and announces it to
the SUs (the dotted arrows).



II. Preliminary 

A. CRN Model and Assumptions 

We consider an infrastructure-based secondary CRN, which consists of a single base station 
(or fusion center) and a set of SUs (or sensors). The fusion center coordinates SUs' collaborative 
spectrum sensing and their access to a licensed PU channel. We assume that the fusion center is 
maintained by a trusted network administrator and has high computation power. For collaborative 
spectrum sensing, all SUs (i) measure the primary signal strength on the same target channel, 
(ii) make local binary decisions on the presence or absence of the primary signal, and (iii) report 
the binary decisions to the fusion center [11], [21]. Based on the reported sensing results, the
fusion center makes a global decision and broadcasts this result to the SUs. 

There is a set $\mathcal{N} = \{1, \ldots, N\}$ of SUs in the network, $M$ of which are attackers as shown in
Fig. 1. We assume that there is at least one honest SU in the network, i.e., $N - M \geq 1$; otherwise,
it would be infeasible to defeat attacks. The honest SUs fairly share the licensed channel among 
themselves when the channel is available to them (i.e., it is not being used by the PU). The 
attackers (i.e., malicious or compromised SUs), on the other hand, behave to maximize their 
own aggregate reward (e.g., achievable throughput) by manipulating their sensing reports so that 
the fusion center makes a wrong decision. In particular, we focus on the case in which attackers
can overhear all honest SUs' sensing reports to the fusion center before collaboratively
manipulating their sensing results. We assume that attackers can communicate with each other
(and thus know the number of attackers), while the honest SUs only communicate with the fusion 
center. The honest SUs do not have to be strategic, and they do not need to make decisions by 
considering other honest SUs and attackers' decisions. In other words, the honest users do not 
play a game with the attackers. 

To make the analysis tractable and obtain useful engineering insights, we make the following 
assumptions throughout the paper. 

A1. All SUs have the same detection performance in terms of primary false alarm ($P_f$) and
missed detection ($P_m$) probabilities.^2

A2. The PU's spectrum occupancy is the same for all SUs^3 and is independent across
different time slots.^4

A3. All SUs have the same transmission rate in utilizing the channel.

In Appendix IHl we relax both assumptions A1 and A3 by studying SUs' heterogeneous detection
performances and heterogeneous transmission rates. We will focus on the most challenging
case of a single attacker (as shown in Theorem 2 and Observation 1 in Section V), and show that
the direct punishment mechanism proposed in Section V can still prevent all attacks. Similarly,
the indirect punishment mechanism proposed in Section VI remains effective in
the two heterogeneous scenarios.

Regarding the PU's temporal channel usage statistics, we denote $P_I$ as the probability that
the channel is actually idle. Thus, the channel is busy with probability $1 - P_I$. We assume
that all SUs (including attackers) and the fusion center know the probability $P_I$ before collaborative
spectrum sensing as in [7], [19], [31]. This is reasonable if the SUs and the fusion center can collect
the PU's activity information from the PU side and calculate $P_I$ using various methods as in (Si . Such
information collection is possible for SUs by examining the PU's published historical activity report
or purchasing the history report from the PU directly. In fact, the precision of $P_I$ does not affect
SUs' decisions and our analytical results, because attackers and the fusion center make
decisions based on their belief of $P_I$.

^2 A false alarm occurs when an SU detects an idle channel as busy, and a missed detection occurs when an SU detects a
busy channel as idle. The detection performance depends on the SU's physical location (relative to the primary transmitter) and
fading environment.

^3 This is true when SUs stay relatively close compared to the PU's coverage area.

^4 This assumption is frequently used in the literature (e.g., [7], [19], [31]), and is reasonable when we try to approximate the
case where the PU's traffic changes fast (e.g., wireless microphones) and the time slot is relatively long. We may need to study
the correlation between spectrum occupancies when the PU's traffic changes slowly over time (e.g., TV transmitters). Analyzing
the correlated case requires a much more complicated Markov decision process (MDP) model than the one that we used in
Section VI, and we consider this as a future direction.

Fig. 2. The behaviors of SUs in each time slot: Phase I: SUs sense and report, and then the fusion center fuses all SUs'
reports and announces the result; Phase II: honest SUs transmit or wait depending on the fusion center's decision. Attackers do
not need to follow the fusion center's decision.

B. Spectrum Sensing and Opportunistic Access Model 

We assume a time-slotted model for opportunistic spectrum access, as in Fig. 2. Such a time-slotted
channel access model has been widely assumed in the literature [19], [34], [43], including
the IEEE 802.22 standard draft [37]. Each time slot consists of two phases:

• Phase I (Collaborative Spectrum Sensing): As shown in Fig. 1, each SU performs sensing
individually and makes a local binary decision (i.e., 0/1) on channel occupancy: 1 if it detects
the PU's signal (i.e., busy), and 0 otherwise (i.e., idle). All honest SUs truthfully report their
sensing decisions to the fusion center. The attackers, on the other hand, overhear the sensing
reports from the honest SUs before sending their own reports (which may be different from
their actual local sensing decisions) to the fusion center. Based on the reports from all
SUs (including the attackers), the fusion center makes a global decision and broadcasts
it to all SUs in the network. We assume that the sensing reports and announcements are
communicated via a dedicated and reliable control channel with no communication errors.
Under this one-hop network configuration, the attackers can overhear the control channel
and easily decode honest SUs' reports like the fusion center.^5 Also, even if we extend this
one-hop communication network to a multi-hop network, it is still possible for attackers to
overhear all honest SUs' reports as long as one attacker is located near the fusion center.

^5 Note that end-to-end encryption of reports sent from SUs to the fusion center could be too complicated and expensive to
implement to prevent attackers' overhearing, as the control channel often can only support very low data rate transmissions.

• Phase II (Spectrum Sharing): If the fusion center announces the channel to be idle, then
honest SUs will transmit in Phase II. If it announces the channel to be busy, then honest
SUs will wait. The attackers may transmit or wait in both cases. We assume that SUs who
transmit in Phase II equally share the transmission time. More advanced link scheduling
and power control may improve the overall network performance in Phase II, but are not the
focus of this paper. Let us normalize the total transmission rate of the channel to 1.^6 More
specifically, X SUs transmitting together leads to a rate of 1/X for each involved SU under
TDMA.

To summarize, the attackers can launch attacks in two different ways: (i) in Phase I by reporting 
falsified sensing results, and (ii) in Phase II by disobeying the fusion center's announcement. 

C. Collision Penalty 

In order to increase social welfare, the government regulatory bodies (e.g., FCC in the U.S. 
and Ofcom in the U.K.) are pushing new spectrum-sharing schemes to allow the coexistence of 
PUs and SUs. There are two main obstacles in persuading PUs to share their licensed spectrum 
bands: (i) PUs' fear of interference or service disruption caused by SUs, and (ii) lack of economic 
incentives to PUs for spectrum sharing. To overcome these obstacles while efficiently preventing
attacks, we adopt the notion of "collision penalty", similar to [40], as an incentive mechanism to
allow for an efficient PU-SU coexistence. When a collision happens, we assume that the PU will
charge a collision penalty $C_p$ to all SUs in the network. This collision penalty will compensate
PUs for potential performance loss due to collisions.^7 The reasons why the PU charges all SUs are
as follows.

• Complexity consideration: If the PU does not know each SU's transmission characteristics
(e.g., modulation and coding schemes), it is impossible for it to check which subset of
SUs causes a collision. Also, the attackers can secure their transmissions (e.g., via MAC-layer
encryption) to avoid being detected and identified. Moreover, it is highly complex and
time-consuming for the PU to identify which SUs cause a collision. Such identification
incurs a detection delay and is thus not desirable [24], [25].

• Responsibility consideration: In cooperative spectrum sensing, each SU contributes its
sensing result to the final decision of the fusion center, and each regular SU follows the
final decision. If a missed detection occurs, the PU should hold all SUs responsible
for their imperfect sensing. Even if some missed detection events are caused by attackers, it
is impossible for the PU (without sensing reports) to identify and punish attackers only.

^6 If the total transmission rate of the channel is $r$ ($\neq 1$), we can change $C_p$ and $C_b$ (defined later in the paper) to $C_p/r$ and
$C_b/r$ and all results will go through.

^7 The penalty $C_p$ can be in the form of monetary payments from SUs, or reduced transmission opportunities of SUs, or
cooperative transmission by SUs to improve the PU's performance [22], [40].

Based on the above discussion, we define the PU's expected utility in one time slot as the 
sum of the PU's successful transmission rate and collision penalty collected from N SUs, i.e., 

$$U_{PU}(C_p) = \big(1 - \gamma(C_p)\big) V(r_{PU}) + \gamma(C_p) N C_p, \tag{1}$$

where $\gamma(C_p)$ is the collision probability of the PU's transmission due to SUs' aggressive access
and is decreasing in $C_p$, $r_{PU}$ is the PU's transmission rate, and $V(r_{PU})$ is the PU's utility of achieving
rate $r_{PU}$. A larger $C_p$ makes SUs more conservative in spectrum access and leads to a lower
$\gamma(C_p)$. Hence, a larger $C_p$ achieves a higher successful transmission rate (the first term in
Eq. (1)), but may also lead to a lower compensation from SUs (the second term in Eq. (1)).

III. Decision Fusion Rule 

Of the various decision fusion rules for collaborative sensing, we adopt the commonly used
OR-rule. Ghasemi and Sousa [38] showed that the OR-rule performs better than other rules
in many cases of practical interest. Here we will discuss the OR-rule as a special case of the
general $n$-out-of-$N$ rule, and derive the conditions on $C_p$ under which the OR-rule is theoretically
optimal. We elaborate the decision fusion rule by focusing on the case in which all SUs are
honest.

At the end of Phase I in each time slot (see Fig. 2), the fusion center collects a binary sensing
report $D_i \in \{0 \text{ (idle)}, 1 \text{ (busy)}\}$ from each SU $i \in \mathcal{N}$, and makes a decision using the following
$n$-out-of-$N$ rule llTll :

$$
\begin{cases}
\mathcal{H}_0 \text{ (primary signal does not exist)}: & \text{if } \sum_{i \in \mathcal{N}} D_i < n, \\
\mathcal{H}_1 \text{ (primary signal exists)}: & \text{if } \sum_{i \in \mathcal{N}} D_i \geq n.
\end{cases} \tag{2}
$$

According to Eq. (2), the fusion center infers the channel to be busy ($\mathcal{H}_1$) when at least $n$ out
of $N$ SUs report 1 (busy); otherwise, it infers the channel to be idle ($\mathcal{H}_0$). The optimal selection
of the threshold $n$ depends on the system parameters and the reward functions of the SUs [21].
When $n = 1$, we have the OR-rule.
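The threshold rule in Eq. (2) can be sketched in a few lines of Python. This is only an illustration: the function name `fuse_reports` and the example report vector are hypothetical and do not come from the paper.

```python
from typing import Sequence


def fuse_reports(reports: Sequence[int], n: int = 1) -> int:
    """n-out-of-N decision fusion (illustrative sketch).

    Returns 1 (H1, channel busy) if at least n reports equal 1,
    and 0 (H0, channel idle) otherwise. n = 1 gives the OR-rule.
    """
    if not 1 <= n <= len(reports):
        raise ValueError("threshold n must satisfy 1 <= n <= N")
    if any(r not in (0, 1) for r in reports):
        raise ValueError("each report must be 0 (idle) or 1 (busy)")
    return 1 if sum(reports) >= n else 0


if __name__ == "__main__":
    reports = [0, 0, 1, 0, 0, 0]          # hypothetical reports from N = 6 SUs
    print(fuse_reports(reports, n=1))     # OR-rule
    print(fuse_reports(reports, n=2))     # stricter 2-out-of-6 rule
```

With the OR-rule, a single report of 1 is enough to announce the channel busy, which is why one falsified report can change the announcement.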

We show that when both of the following conditions hold, the OR-rule provides the highest
reward for each SU within the family of $n$-out-of-$N$ rules.

1) When all SUs report 0 (i.e., $\sum_{i \in \mathcal{N}} D_i = 0$), each SU obtains a positive expected reward
by sharing the spectrum opportunity after taking into account the false alarm and missed
detection probabilities:

$$\Pr\Big(\text{idle} \,\Big|\, \sum_{i \in \mathcal{N}} D_i = 0\Big) \frac{1}{N} - \Pr\Big(\text{busy} \,\Big|\, \sum_{i \in \mathcal{N}} D_i = 0\Big) C_p > 0. \tag{3}$$

The expected reward is the difference between the expected transmission rate and the
collision penalty. Here, idle and busy denote the actual state of the channel instead of
the fusion center's announcement (i.e., $\mathcal{H}_0$ or $\mathcal{H}_1$).
2) When at least one SU reports 1 (i.e., $\sum_{i \in \mathcal{N}} D_i \geq 1$), every SU obtains a negative expected
reward by sharing the spectrum opportunity:

$$\Pr\Big(\text{idle} \,\Big|\, \sum_{i \in \mathcal{N}} D_i \geq 1\Big) \frac{1}{N} - \Pr\Big(\text{busy} \,\Big|\, \sum_{i \in \mathcal{N}} D_i \geq 1\Big) C_p < 0. \tag{4}$$

We can write these two conditions more compactly by defining the following two notations:

$$P^I_{N,k} := \Pr\Big(\text{idle} \,\Big|\, \sum_{i \in \mathcal{N}} D_i = k\Big) = \frac{P_I (1 - P_f)^{N-k} P_f^{k}}{P_I (1 - P_f)^{N-k} P_f^{k} + (1 - P_I) P_m^{N-k} (1 - P_m)^{k}}, \tag{5}$$

$$P^B_{N,k} := \Pr\Big(\text{busy} \,\Big|\, \sum_{i \in \mathcal{N}} D_i = k\Big) = 1 - P^I_{N,k}. \tag{6}$$

Notice that $P^I_{N,k}$ in Eq. (5) is decreasing in $k$, and $P^B_{N,k}$ in Eq. (6) is increasing in $k$. Thus,
the expected reward from sharing, $P^I_{N,k} \frac{1}{N} - P^B_{N,k} C_p$, is decreasing in $k = \sum_{i \in \mathcal{N}} D_i$. This implies that with more SUs reporting 1, SUs have
less incentive to transmit. We can summarize the range of $C_p$ satisfying both Eqs. (3) and (4)
as follows.

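Theorem 1: At the fusion center, the OR-rule outperforms the other $n$-out-of-$N$ rules ($n > 1$)
when the collision penalty $C_p$ satisfies the following condition:

$$\frac{P_I}{N (1 - P_I)} \cdot \frac{1 - (1 - P_f)^N}{1 - P_m^N} < C_p < \frac{P_I}{N (1 - P_I)} \left( \frac{1 - P_f}{P_m} \right)^N. \tag{7}$$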
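As a rough numerical illustration of how a range for $C_p$ follows from Eqs. (3) and (4), the sketch below computes the posterior idle probability of Eq. (5) and the two bounds obtained by solving Eqs. (3) and (4) for $C_p$. It assumes, as in the normalized rate model, that each of the $N$ SUs receives rate $1/N$ when all of them share the channel. The function names are hypothetical, and the parameter values are the ones quoted in the caption of Fig. 3.

```python
def likelihoods(N, k, p_idle, p_f, p_m):
    """Weights of (idle, k ones) and (busy, k ones); the common binomial factor is omitted."""
    like_idle = p_idle * (1 - p_f) ** (N - k) * p_f ** k
    like_busy = (1 - p_idle) * p_m ** (N - k) * (1 - p_m) ** k
    return like_idle, like_busy


def posterior_idle(N, k, p_idle, p_f, p_m):
    """Pr(idle | exactly k of N SUs sense busy), as in Eq. (5)."""
    like_idle, like_busy = likelihoods(N, k, p_idle, p_f, p_m)
    return like_idle / (like_idle + like_busy)


def cp_range(N, p_idle, p_f, p_m):
    """Bounds on C_p obtained by solving Eqs. (3) and (4) for C_p (sketch)."""
    # Upper bound from Eq. (3): Pr(idle | sum = 0) / N > Pr(busy | sum = 0) * C_p.
    like_idle0, like_busy0 = likelihoods(N, 0, p_idle, p_f, p_m)
    upper = like_idle0 / (N * like_busy0)
    # Lower bound from Eq. (4): condition on at least one report of 1.
    idle_some = p_idle * (1 - (1 - p_f) ** N)
    busy_some = (1 - p_idle) * (1 - p_m ** N)
    lower = idle_some / (N * busy_some)
    return lower, upper


if __name__ == "__main__":
    for N in (1, 2, 6, 10):
        lower, upper = cp_range(N, p_idle=0.6, p_f=0.08, p_m=0.08)
        print(f"N={N}: {lower:.4f} < C_p < {upper:.4g}")
```

The upper bound is computed from the likelihood ratio rather than from $1 - P^I_{N,0}$ to avoid loss of precision when the posterior idle probability is very close to 1.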
Fig. 3. $C_p$ range for the OR-rule's optimal application with $(P_I, P_f, P_m) = (0.6, 0.08, 0.08)$, plotted against the number of SUs $N$ (from 1 to 10).



The lower bound of $C_p$ in Condition (7) discourages the SUs from transmitting when at least
one SU reports 1 (busy). The upper bound of $C_p$ in Condition (7) encourages SUs to transmit
when all $N$ SUs report 0 (idle). In the rest of the paper, we assume that $C_p$ always satisfies
Condition (7).

In Region II of Fig. 3, the OR-rule outperforms the other $n$-out-of-$N$ rules for various
numbers of SUs and values of the collision penalty $C_p$.^8 As the number of SUs increases, the bounds on
$C_p$ increase. For a fixed $C_p$, more SUs lead to an increase of the false alarm probability and a
decrease of the missed detection probability for the whole system. Then the SUs tend to strategically
transmit more aggressively even when some SU(s) reports 1 (busy). To prevent this and ensure
the optimality of the OR-rule, a higher value of $C_p$ is required. The other decision fusion rules
in Region III of Fig. 3 are not the focus of this paper. However, our analysis of the OR-rule can
still apply to Region III, since in later analysis we consider all possible $C_p$ values and do not
restrict our attention to Condition (7).

IV. Attackers' Behaviors Without Punishment 

In this section, we analyze the behavior of cooperative attackers when the system lacks attack- 
prevention mechanisms. The results in this section will serve as a benchmark for the proposed 
attack-prevention mechanisms in Sections V and VI.

We first define some useful notations. 

^8 According to the IEEE 802.22 WRAN standard, $P_f$ and $P_m$ must be less than 10% [37].




• State set $\mathcal{S}$: A state $s \in \mathcal{S}$ describes the local sensing decisions of the honest SUs and
attackers: $\big(\sum_{i \in \mathcal{N} \setminus \mathcal{M}} D_i, \sum_{i \in \mathcal{M}} D_i\big)$. The size of set $\mathcal{S}$ is $(N - M + 1)(M + 1)$.^9 The attackers
know the exact state in a particular time slot by overhearing the honest SUs' reports to the
fusion center.

• Attackers' action set $\mathcal{A}$: The action of an attacker $m \in \mathcal{M}$ is a tuple, (report to the
fusion center in Phase I, spectrum access decision in Phase II), which has 4 possibilities:
(idle, wait), (busy, wait), (idle, transmit), and (busy, transmit). Define $\boldsymbol{a} = \{a_m, \forall m \in \mathcal{M}\}$
as the action vector of all attackers, and $\mathcal{A}$ includes all possible $\boldsymbol{a}$.^10

• Attackers' expected aggregate reward $R(\boldsymbol{a}, s)$: This reward depends on the state $s$ and
the attackers' actions $\boldsymbol{a}$ in one time slot. It denotes the difference between the attackers'
aggregate transmission rate and their expected payment to the PU due to usage collision in one
time slot.

For each state $s$, the attackers choose $\boldsymbol{a}$ to maximize the expected aggregate reward in a single
time slot, i.e.,

$$\max_{\boldsymbol{a} \in \mathcal{A}} R(\boldsymbol{a}, s). \tag{8}$$

We discuss the solution to Eq. (8) in the following three cases.



A. All SUs sense the channel idle 

Proposition 1: Given the state $s = \big(\sum_{i \in \mathcal{N} \setminus \mathcal{M}} D_i = 0, \sum_{i \in \mathcal{M}} D_i = 0\big)$, the cooperative attackers'
optimal actions are: at least one attacker adopts the action (busy, transmit) and the other
attackers (if any) adopt the action (idle, transmit). That is, at least one attacker will report the
channel busy in Phase I and all attackers will transmit exclusively over the channel in Phase II.
The fusion center will announce a wrong decision $\mathcal{H}_1$ in this case. The attackers' expected
aggregate reward is:

$$R(\boldsymbol{a}, s) = P^I_{N,0} - M P^B_{N,0} C_p > 0, \tag{9}$$

where the definitions of $P^I_{N,0}$ and $P^B_{N,0}$ are given in Eqs. (5) and (6), respectively. An honest SU
does not transmit, but may suffer from the collision penalty caused by attackers and receives a
negative expected reward

$$R_{honestSU}(s) = -P^B_{N,0} C_p < 0. \tag{10}$$

^9 The value of $D_i$ can be either 0 or 1, thus $\sum_{i \in \mathcal{N} \setminus \mathcal{M}} D_i$ ranges from 0 to $N - M$ and $\sum_{i \in \mathcal{M}} D_i$ ranges from 0 to $M$.

^10 Note that if all attackers have the same action sets, the fusion center may find it easier to identify them by checking their
reports over time.

Proof. Given the state $s = \big(\sum_{i \in \mathcal{N} \setminus \mathcal{M}} D_i = 0, \sum_{i \in \mathcal{M}} D_i = 0\big)$ in a time slot, the attackers may
report truthfully or falsely in Phase I:

• If all attackers report 0 in Phase I (i.e., the sum of the attackers' reports is 0), then the announcement
at the fusion center is $\mathcal{H}_0$ and all honest SUs will transmit. Consider $M_T$ ($0 \leq M_T \leq M$)
attackers choosing to transmit rather than wait in Phase II. The attackers' expected aggregate
reward in this time slot is:

$$R(\boldsymbol{a}, s)(M_T) = M_T \left( P^I_{N,0} \frac{1}{N - M + M_T} \right) - M P^B_{N,0} C_p,$$

which is increasing in $M_T$. Thus all attackers will transmit, i.e., $M_T = M$. Then the
attackers' expected aggregate reward is

$$R(\boldsymbol{a}, s) = M \left( P^I_{N,0} \frac{1}{N} - P^B_{N,0} C_p \right) > 0, \tag{11}$$

due to Condition (7).

• If at least one attacker reports 1 in Phase I (i.e., the sum of the attackers' reports is at least 1), then the announcement at
the fusion center is $\mathcal{H}_1$ and all honest SUs will not transmit.

- Consider $M_T$ ($1 \leq M_T \leq M$) attackers choosing to transmit. The attackers' expected
aggregate reward is given by (9), which depends on $M$ but not on $M_T$, and is larger than (11).

- If all attackers wait, the attackers' expected aggregate reward equals 0, which is less
than (11).

By comparing (11) and (9) with different actions, we conclude that at least one attacker will
report 1 and steal the opportunity from honest SUs to utilize the channel exclusively. As a result,
all honest SUs will not transmit but may suffer the collision penalty as in (10). ■

Proposition 1 shows that an attack always happens when all SUs sense the channel idle.
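The case analysis in the proof can be cross-checked numerically with a brute-force search over the attackers' joint actions. The sketch below is illustrative only: the function names are hypothetical, the values N = 6, M = 2, and C_p = 1.0 are assumed example values (with the parameters from the caption of Fig. 3), and the reward model is the one used in the proof (transmitting SUs share a normalized rate of 1, and all M attackers are charged C_p whenever a collision occurs).

```python
from itertools import product

# An attacker's action is (report, transmit): report is 0 (idle) or 1 (busy).
ACTIONS = [(report, transmit) for report in (0, 1) for transmit in (False, True)]


def posterior_idle(N, k, p_idle, p_f, p_m):
    """Pr(idle | exactly k of N SUs sense busy), as in Eq. (5)."""
    like_idle = p_idle * (1 - p_f) ** (N - k) * p_f ** k
    like_busy = (1 - p_idle) * p_m ** (N - k) * (1 - p_m) ** k
    return like_idle / (like_idle + like_busy)


def aggregate_reward(actions, N, honest_ones, attacker_ones, c_p, p_idle, p_f, p_m):
    """Attackers' expected aggregate reward in one slot under the OR-rule (sketch)."""
    M = len(actions)
    p_i = posterior_idle(N, honest_ones + attacker_ones, p_idle, p_f, p_m)
    p_b = 1.0 - p_i
    # Honest SUs report truthfully; any report of 1 makes the announcement busy.
    announced_busy = honest_ones > 0 or any(report == 1 for report, _ in actions)
    attackers_tx = sum(1 for _, transmit in actions if transmit)
    honest_tx = 0 if announced_busy else N - M
    total_tx = honest_tx + attackers_tx
    if total_tx == 0:
        return 0.0  # nobody transmits: no rate and no collision
    return p_i * attackers_tx / total_tx - M * p_b * c_p


def best_joint_action(N, M, honest_ones, attacker_ones, c_p, p_idle, p_f, p_m):
    """Exhaustive search over the 4**M joint actions (first maximizer on ties)."""
    return max(
        product(ACTIONS, repeat=M),
        key=lambda a: aggregate_reward(a, N, honest_ones, attacker_ones, c_p, p_idle, p_f, p_m),
    )


if __name__ == "__main__":
    best = best_joint_action(6, 2, 0, 0, c_p=1.0, p_idle=0.6, p_f=0.08, p_m=0.08)
    print(best, aggregate_reward(best, 6, 0, 0, 1.0, 0.6, 0.08, 0.08))
```

Because the reward in (9) does not depend on how many attackers transmit, several joint actions tie for the maximum; the search returns one of them, and every maximizer includes at least one falsified busy report.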

B. All honest SUs sense the channel idle, but some attacker(s) senses the channel busy 
Here we define the attackers' aggregate sensing result as $\bar{M} := \sum_{i \in \mathcal{M}} D_i$.

Proposition 2: Given the state $s = \big(\sum_{i \in \mathcal{N} \setminus \mathcal{M}} D_i = 0, \sum_{i \in \mathcal{M}} D_i = \bar{M} \geq 1\big)$, the cooperative
attackers' optimal actions are as follows.

• If $P^I_{N,\bar{M}} < M P^B_{N,\bar{M}} C_p$, then at least one attacker adopts the action (busy, wait) and the other
attackers (if any) adopt the action (idle, wait). This leads to a correct announcement $\mathcal{H}_1$
(busy) at the fusion center. Since no one transmits, the attackers and the honest SUs all get
zero reward,

$$R(\boldsymbol{a}, s) = R_{honestSU}(s) = 0. \tag{12}$$

• If $P^I_{N,\bar{M}} \geq M P^B_{N,\bar{M}} C_p$, then at least one attacker adopts the action (busy, transmit) and the
other attackers (if any) adopt the action (idle, transmit). This leads to a correct announcement
$\mathcal{H}_1$ (busy) at the fusion center. Only attackers will transmit exclusively in Phase II, and their
expected aggregate reward is:

$$R(\boldsymbol{a}, s) = P^I_{N,\bar{M}} - M P^B_{N,\bar{M}} C_p \geq 0. \tag{13}$$

An honest SU does not transmit in Phase II, but may suffer from the collision penalty
caused by attackers' transmissions and receives a negative expected reward

$$R_{honestSU}(s) = -P^B_{N,\bar{M}} C_p < 0. \tag{14}$$

Proof. Given the state $s = \big(\sum_{i \in \mathcal{N} \setminus \mathcal{M}} D_i = 0, \sum_{i \in \mathcal{M}} D_i = \bar{M} \geq 1\big)$ in one time slot, the attackers
may report truthfully or falsely in Phase I:

• If all attackers report 0 in Phase I (i.e., the sum of the attackers' reports is 0), then the announcement
at the fusion center is $\mathcal{H}_0$ and all honest SUs will transmit. Similar to the proof in Subsection
IV-A, it is optimal for all attackers to transmit. Their expected aggregate reward is:

$$R(\boldsymbol{a}, s) = M \left( P^I_{N,\bar{M}} \frac{1}{N} - P^B_{N,\bar{M}} C_p \right) < 0, \tag{15}$$

due to Condition (7).

• If at least one attacker reports 1 in Phase I (i.e., the sum of the attackers' reports is at least 1), then the announcement at
the fusion center is $\mathcal{H}_1$ and all honest SUs will not transmit.

- If at least one attacker transmits in Phase II, the attackers' expected aggregate reward is
given by (13). Notice that (13) is negative only if the collision penalty is high enough.

- If all attackers wait in Phase II, the attackers' expected aggregate reward equals 0.

By comparing (13) and (15) with different actions, we conclude that at least one attacker will
report 1 to ensure that the correct announcement is made at the fusion center. But the attackers
may transmit over the channel exclusively and the honest SUs may suffer from the collision
penalty caused by the attackers with an expected reward in (14). ■
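The transmit-or-wait condition in Proposition 2 can be read as a threshold on the collision penalty: the attackers transmit exclusively when $C_p$ does not exceed $P^I / (M P^B)$, evaluated at the observed number of busy sensing decisions. The following sketch computes that threshold. The function names are hypothetical, `k` stands for the total number of SUs that sense the channel busy (here, the attackers' aggregate sensing result), and the numbers N = 6, M = 2 with the parameters from the caption of Fig. 3 are assumed example values.

```python
def attack_threshold(N, M, k, p_idle, p_f, p_m):
    """Largest C_p for which exclusive transmission is still worthwhile (sketch).

    Solves P^I_{N,k} >= M * P^B_{N,k} * C_p for C_p; the ratio P^I / P^B equals
    the ratio of the idle and busy likelihoods used in Eq. (5).
    """
    like_idle = p_idle * (1 - p_f) ** (N - k) * p_f ** k
    like_busy = (1 - p_idle) * p_m ** (N - k) * (1 - p_m) ** k
    return like_idle / (M * like_busy)


def attackers_transmit(c_p, N, M, k, p_idle, p_f, p_m):
    """True if the attackers' best response is to transmit exclusively."""
    return c_p <= attack_threshold(N, M, k, p_idle, p_f, p_m)


if __name__ == "__main__":
    for k in (1, 2):
        t = attack_threshold(6, 2, k, p_idle=0.6, p_f=0.08, p_m=0.08)
        print(f"k={k}: attackers transmit if C_p <= {t:.4g}")
```

The threshold shrinks as more SUs sense the channel busy, which matches the observation that SUs have less incentive to transmit when more reports of 1 are observed.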
Proposition 2 indicates that an attack only happens when the benefit of exclusive transmission
is large enough to compensate for the potential collision penalty for the attackers.

C. Some honest SUs sense the channel busy

Proposition 3: Given the state $s = \big(\sum_{i \in \mathcal{N} \setminus \mathcal{M}} D_i = K \geq 1, \sum_{i \in \mathcal{M}} D_i = \bar{M} \geq 0\big)$,^11 the
announcement at the fusion center is always correct with $\mathcal{H}_1$ (busy), and the attackers' optimal
actions are as follows.

• If $P^I_{N,K+\bar{M}} < M P^B_{N,K+\bar{M}} C_p$, then each attacker can either take the action (busy, wait) or
(idle, wait). Since no one transmits, the attackers and the honest SUs all get zero reward,

$$R(\boldsymbol{a}, s) = R_{honestSU}(s) = 0. \tag{16}$$

• If $P^I_{N,K+\bar{M}} \geq M P^B_{N,K+\bar{M}} C_p$, then each attacker can either take the action (busy, transmit) or
(idle, transmit). As only attackers will transmit in Phase II, their expected aggregate reward
is:

$$R(\boldsymbol{a}, s) = P^I_{N,K+\bar{M}} - M P^B_{N,K+\bar{M}} C_p. \tag{17}$$

An honest SU does not transmit in Phase II, but may suffer from the collision penalty
caused by attackers' transmissions and receives a negative expected reward

$$R_{honestSU}(s) = -P^B_{N,K+\bar{M}} C_p < 0. \tag{18}$$

Proof. Given the state $s = \left(\sum_{i \in \mathcal{N} \setminus \mathcal{M}} D_i = K \geq 1, \sum_{i \in \mathcal{M}} D_i = \tilde{M} \geq 0\right)$, no matter what
the attackers report in Phase I, the announcement at the fusion center is always $\mathcal{H}_1$, and the honest
SUs will not transmit in Phase II.

• If some attackers transmit in Phase II, the attackers' expected aggregate reward is given by (17).

• If all attackers wait in Phase II, the attackers' expected aggregate reward equals 0. Each
honest SU's immediate expected reward also equals 0.

By comparing (17) to 0, we conclude that the fusion center always makes the correct
announcement $\mathcal{H}_1$ regardless of the attackers' reports. However, the attackers may transmit over
the channel exclusively in Phase II, and the honest SUs may suffer from the collision penalty
caused by the attackers, with the expected reward in (18). ■

^{11} Note that this state includes the case in which all honest SUs sense the channel busy and (some) attackers sense idle.






TABLE II
Attackers' optimal behaviors and honest SUs' behaviors

Sensing decisions | Attackers' optimal behaviors | Honest SUs' behaviors
$\sum_{i \in \mathcal{N}} D_i = 0$ | Attack by reporting falsely and transmitting exclusively | Wait
$\sum_{i \in \mathcal{N}} D_i = K \geq 1$ | If $P^I_{N,K} < M P^B_{N,K} C_p$, do not attack. If $P^I_{N,K} \geq M P^B_{N,K} C_p$, attack by reporting truthfully and transmitting exclusively | Wait

We summarize the results of Propositions 1-3 in Table II. Without any attack-prevention
mechanism, the attackers will utilize the spectrum opportunities exclusively, whereas the honest
SUs will never transmit regardless of their sensing decisions. What is worse, the honest SUs
may suffer from the collision penalty caused by the attackers.

Note that our current analytical results focus on one time slot, where the attackers want
to maximize their expected aggregate reward in the current time slot (i.e., the "attack-and-run"
scenario). Since the attackers' behaviors are independent over time slots, the above analytical results
also hold for the "stay-with-attacks" scenario.

Given the many possible attack scenarios in Section IV, it is hard to identify attackers based on
their report orders and results in Phase I. The reasons are as follows.

• First, different SUs may have different sensing times to guarantee certain precision of 
channel detection, and thus it is not possible to force everyone to report at the same time. 
This means that there is always a last reporter. If all SUs are honest, then the last reporter 
is not an attacker. Unless the fusion center is sure that there exists at least one attacker, it 
is hard to tell that the last reporting SU is an attacker. 

• Second, even if the fusion center is aware of the attacker(s), it is still difficult to punish attackers
effectively, since the attackers (aware of such identification) can strategically avoid
reporting last.

- When the attackers overhear some honest SU(s) reporting 1 (busy) at the beginning,
they can report immediately after their sensing and do not need to wait for the last
honest SU's report. In this case, the fusion center's decision is correct ($\mathcal{H}_1$) whether or not
the attackers manipulate their reports. But the attackers can still attack (i.e., violate
the fusion center's decision and transmit) as shown in Case C in Section IV. In this
case, the attackers still need to overhear all honest SUs' reports.
- When the attackers overhear many honest SUs reporting 0 (idle), they may not wait
for the last honest SU's report and can still manipulate their reports. In this case, such
identification still hurts honest SU(s), and the attackers still perform attacks although
they lose a little information.
It should also be noted that the fusion center could monitor the control channel
to check which SUs are attackers that secretly exchange their sensing results in Phase I. The
fusion center could also monitor the PU's licensed channel later to see who disobeys its decision
and transmits exclusively. But these attack identifications require the fusion center to know
at least all SUs' coding and modulation schemes. Even if the fusion center has such information,
the attackers can still change their coding and modulation schemes (e.g., to match those of some honest SUs),
or secure their communication when exchanging sensing results in Phase I and transmitting in
Phase II, e.g., via MAC-layer encryption, to avoid being identified by the fusion center.

The above discussions illustrate why we are interested in designing attack-prevention
mechanisms without attack identification.

V. Attack-Prevention Mechanism: A Direct Punishment 

In this section, we consider the case in which the fusion center can directly charge a punishment
to the SUs when an attack is detected. We focus on the "attack-and-run" scenario in a single
time slot. The analysis also applies to the "stay-with-attacks" scenario as in Section IV. With
the proper choice of punishment, the proposed mechanism ensures that no attack will happen
and no one will be punished.

Fig. 4. Direct punishment threshold $C_b^{th}(M)$ versus the number of attackers M, for different M and N cases with $(P_I, P_f, P_m, C_p) = (0.6, 0.08, 0.08, 6e+10)$.

Let us denote the direct punishment as $C_b$, which is different from the collision penalty $C_p$
introduced in Section III-C. The fusion center charges the punishment to all SUs only when
the PU detects an attack. Let us consider the following two scenarios:

• When the announcement at the fusion center is $\mathcal{H}_1$ (busy) in Phase I and a collision happens
in Phase II, the fusion center knows that an attack has happened (as honest SUs will not transmit
in Phase II). In this case, all SUs are charged a direct punishment $C_b$ by the fusion center
(in addition to the collision penalty $C_p$ charged by the PU).^{12}

Note that when the announcement at the fusion center is $\mathcal{H}_0$ (idle) in Phase I, no direct
punishment will be triggered even if there is a collision in Phase II. This is because attackers
will not share the spectrum access opportunity with honest SUs, as shown in Proposition 1, and such a
collision can only be the result of missed detection in spectrum sensing.
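The trigger rule for the direct punishment can be sketched as follows. This is only an illustration under the stated rule (punish all SUs if and only if the announcement is busy and a collision occurs); the function names and the string encoding of the announcement are assumptions made here.

```python
def direct_punishment_triggered(announcement, collision):
    """Punishment is triggered only for a collision after a 'busy' announcement."""
    return announcement == "busy" and collision


def charge_per_su(announcement, collision, c_p, c_b):
    """Total charge to one SU in a time slot.

    c_p: collision penalty charged by the PU (applies to any collision).
    c_b: direct punishment charged by the fusion center (attack detected).
    """
    if not collision:
        return 0.0
    charge = c_p
    if direct_punishment_triggered(announcement, collision):
        charge += c_b
    return charge
```

A collision after an idle announcement is attributed to missed detection, so only the collision penalty applies in that case.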

The effectiveness of the attack-prevention mechanism depends on the choice of the punishment
$C_b$. Theorem 2 shows that a large enough $C_b$ can prevent all possible attacks.

Theorem 2: For M attackers in the network, there exists a threshold $C_b^{th}(M)$, i.e.,



(19) 

such that any value $C_b > C_b^{th}(M)$ can prevent all attack scenarios described in Section IV.

The proof of Theorem 2 is given in Appendix A. Next, we examine how the numbers of
honest SUs and attackers affect the threshold $C_b^{th}(M)$.

Observation 1: $C_b^{th}(M)$ is decreasing in the number of attackers M and increasing in the
number of honest SUs $N - M$. If the fusion center does not know the number of attackers, it
should set the threshold to be $C_b^{th}(1) = \max_{M \geq 1} C_b^{th}(M)$ to prevent all attacks.

^{12} The way for the fusion center to realize the punishment $C_b$ is similar to the way to realize the collision penalty $C_p$. See
footnote 7 for details.

Fig. 5. Direct punishment threshold $C_b^{th}(M)$ versus the number of attackers M, for different M and $P_I$ cases with $(P_f, P_m, C_p, N) = (0.08, 0.08, 6e+10, 11)$.

Figure 4 shows the value of the threshold $C_b^{th}(M)$ as a function of M for different values of
N. When the number of attackers increases, the total penalty to the group of attackers also
increases when an attack is confirmed (while the total transmission rate does not change), which
discourages attacks.

Figure 4 also shows that $C_b^{th}(M)$ increases with the number of honest SUs $N - M$ for any
fixed M. This is because the more honest SUs' sensing reports the attackers overhear,
the more accurately they can estimate the actual channel state, and thus the more likely they
are to launch an attack. As a result, a higher $C_b$ is required to prevent attackers from
manipulating their sensing reports. Thus, the single-attacker scenario (i.e., M = 1) is the most
challenging case for this attack-prevention mechanism.

Observation 2: The threshold $C_b^{th}(1)$ is increasing in the idle probability $P_I$ and non-increasing
in the collision penalty $C_p$.

Figure 5 shows that the value of the threshold $C_b^{th}(M)$ is increasing in the idle probability $P_I$.
A larger $P_I$ means a higher channel availability, and thus encourages the attackers to launch an
attack so that they can exclusively utilize the channel more frequently. A larger $C_p$ discourages
the attackers from accessing the channel due to the possibility of paying a large collision penalty.

^{13} Since $P_f$ and $P_m$ must be less than 10% in the IEEE 802.22 WRAN standard draft, the probability of triggering the direct punishment
is very small under this choice of $P_f$ and $P_m$. As a result, a high normalized threshold $C_b^{th}(M)/r$ is obtained in Fig. 4 to eliminate
the attack benefit.

VI. Attack-Prevention Mechanism: An Indirect Punishment 

The direct punishment scheme may be difficult to enforce for certain types of networks due
to practical constraints, such as implementation overhead and complexity. For example, if the
direct punishment is in the form of monetary payments from SUs to the fusion center, the fusion
center needs to have reliable channels to collect and monitor such payments [22], [39]. In this
section, we propose an indirect punishment scheme that can effectively prevent attacks in the
"stay-with-attacks" scenario as long as the attackers care enough about future rewards. The key
idea is to terminate collaborative sensing once the fusion center detects an attack, which forces
the attackers to rely on their own sensing results in the future. This prevents attackers from
overhearing honest SUs' sensing reports, and results in an increase in the missed detection probability
for attackers. Therefore, such an indirect punishment reduces the attackers' incentives to attack.

The indirect punishment works as follows: 

• When the fusion center announces $\mathcal{H}_1$ (busy) in Phase I and a collision happens in Phase
II, the indirect punishment is triggered and there is no collaborative sensing in future time
slots.^{14}

Note that when the fusion center announces $\mathcal{H}_0$ (idle) in Phase I, no indirect punishment will
be triggered even if there is a collision in Phase II.

Similar to the direct punishment mechanism in Section V, no indirect punishment will be
triggered if all SUs behave honestly. The effectiveness of the indirect punishment depends on
the attackers' performance when they are isolated from the honest SUs.
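The indirect punishment is a one-way switch at the fusion center. The sketch below is an illustrative model of that rule only; the class name, method name, and string encoding of the announcement are assumptions and not part of the paper.

```python
class FusionCenter:
    """Tracks whether collaborative sensing is still offered to the SUs."""

    def __init__(self):
        self.collaborative_sensing = True

    def end_of_slot(self, announcement, collision):
        """Apply the indirect punishment rule after one time slot.

        Collaborative sensing is terminated permanently once a collision
        follows a 'busy' announcement. Returns the current status.
        """
        if self.collaborative_sensing and announcement == "busy" and collision:
            self.collaborative_sensing = False
        return self.collaborative_sensing
```

Once the switch is off it never turns back on, which matches the infinitely long punishment considered in this section.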

In the rest of the section, we make the following assumption:

A4 is derived from $P^I_{1,0} - P^B_{1,0} C_p < 0$, which implies that a single SU will not transmit based on
its own sensing decision (since it can be quite unreliable after the collaborative sensing breaks
down) even without interference from the other SUs. A4 is quite mild. When the number of
SUs is reasonable (i.e., N > 7), Condition I in Eq. (7) directly guarantees the satisfaction of
A4 in Eq. (20). Note that A4 only applies to this section.

^{14} The fusion center can achieve this by broadcasting to all SUs that there is no need to report local sensing decisions in the
future.

To analyze the attackers' dynamic decisions in the long-term "stay-with-attacks" scenario, we
formulate the problem as a Markov decision process (MDP). More specifically, we consider
an infinite-horizon Markov decision process $(\mathcal{S}', \mathcal{A}', P, R)$, where the group of cooperative
attackers is (collectively) the only decision-maker over time.

• State set $\mathcal{S}'$: A state $s \in \mathcal{S}'$ describes the attackers' knowledge of the honest SUs' sensing
decisions, their own sensing decisions, and whether the indirect punishment is
triggered: $\left(\sum_{i \in \mathcal{N} \setminus \mathcal{M}} D_i, \sum_{i \in \mathcal{M}} D_i, \text{Punishment}\right)$. When Punishment = off, the attackers
know $\sum_{i \in \mathcal{N} \setminus \mathcal{M}} D_i$ by overhearing the honest SUs' reports. When Punishment = on,
$\sum_{i \in \mathcal{N} \setminus \mathcal{M}} D_i$ = Unknown, as the attackers do not
know the honest SUs' sensing decisions. The size of set $\mathcal{S}'$ is $(N-M+1)(M+1) + (M+1)$.
The attackers know the state during each time slot.

• Attackers' action set $\mathcal{A}'$: The action of an attacker $m \in \mathcal{M}$ is a tuple: (report to the fusion
center, spectrum access decision). When the indirect punishment is not triggered, there are
four possible actions: (idle, wait), (busy, wait), (idle, transmit), and (busy, transmit). When
the indirect punishment is triggered, an attacker's action can be (N/A, transmit) or (N/A,
wait), where N/A means that the attackers do not report. We define $a = \{a_m, \forall m \in \mathcal{M}\}$
as the action vector of all attackers, and $\mathcal{A}'$ contains all feasible values of a.

• Transition probability $P(a, s, s')$: The probability that actions a in state s at time
slot t lead to state s' in time slot t + 1 is $P(a, s, s') = \Pr(s_{t+1} = s' \mid s_t = s, a_t = a)$.
This depends on both state s and actions a, and is independent of time t.

• Attackers' expected aggregate reward $R(a, s)$: The attackers' received reward after taking
actions a in state s of a time slot.
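The size of the state set can be checked by direct enumeration. This is a small sketch under the state description above; the tuple encoding, with `None` standing for the "Unknown" honest count after punishment, is an assumption made for illustration.

```python
from itertools import product


def enumerate_states(n_total, n_attackers):
    """List all MDP states as (honest_busy, attacker_busy, punishment).

    honest_busy is None when the punishment is on, because the attackers
    can no longer overhear the honest SUs' sensing decisions.
    """
    n_honest = n_total - n_attackers
    states_off = [
        (h, a, "off")
        for h, a in product(range(n_honest + 1), range(n_attackers + 1))
    ]
    states_on = [(None, a, "on") for a in range(n_attackers + 1)]
    return states_off + states_on
```

For example, with N = 11 and M = 3 the enumeration yields 9 x 4 + 4 = 40 states, in agreement with the expression $(N-M+1)(M+1) + (M+1)$.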

Compared to the reward in the current time slot, the attackers may value future rewards less.
This can be captured by a discount factor $\delta \in (0, 1)$. We further define a stationary policy u
as a mapping from the set of states $\mathcal{S}'$ to the action set $\mathcal{A}'$. In other words, a policy defines
what action to take in each possible state. The attackers' objective is to choose a policy u from
the policy set $\mathcal{U}$ to maximize the long-term expected aggregate reward:

$\max_{u \in \mathcal{U}} \sum_{t=0}^{\infty} \delta^t R(u(s), s).$ (21)

Let us denote the attackers' optimal long-term expected aggregate rewards by $LR^{H}$ and $LR^{DH}$
if they behave honestly and dishonestly, respectively.
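To make the role of the discount factor concrete, the following sketch evaluates a discounted reward stream. It is a generic illustration of discounting, not the paper's derivation of the long-term rewards; the function names are hypothetical.

```python
def discounted_return(rewards, delta):
    """Sum of delta**t * rewards[t] over a finite list of per-slot rewards."""
    total = 0.0
    for t, reward in enumerate(rewards):
        total += (delta ** t) * reward
    return total


def constant_stream_value(reward, delta):
    """Closed form for an infinite stream with the same expected reward per slot."""
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return reward / (1.0 - delta)
```

A discount factor close to 1 makes the infinite-horizon value large relative to any one-slot gain, which is why patient attackers are easier to deter with a punishment that lowers all future rewards.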

Since the attackers' behaviors and rewards before and after the indirect punishment are quite
different, we need to study them separately. Here we first consider the attackers' behaviors
before the punishment. Let us consider the case where at least one SU senses the channel busy,
i.e., $\sum_{i \in \mathcal{N}} D_i = K \geq 1$. The attackers' optimal behaviors can be classified into two cases:

• Non-aggressive Transmission: The attackers will not attack for any $K \geq 1$, which is true if

Case.NT: $P^I_{N,1} - M P^B_{N,1} C_p < 0,$ (22)

where the attackers' exclusive transmission opportunity does not compensate for their collision
penalty.

• Aggressive Transmission: The attackers may attack even if $K \geq 1$, which is true if

Case.AT: $P^I_{N,1} - M P^B_{N,1} C_p \geq 0.$ (23)

In the rest of this section, we focus on Case.NT with $M \geq 1$ attackers. The discussion for
Case.AT with $M \geq 1$ is given in Appendix C.

We analyze the conditions under which attacks can be completely prevented via an indirect
punishment. We first need to understand the attackers' performance degradation once the indirect
punishment is triggered. Since the attackers are cooperative, they can always exchange sensing
information among themselves. Depending on whether the attackers will transmit after the
indirect punishment, we have two cases:

• Weak Cooperation: The attackers will not transmit even when all attackers sense the channel
idle,

Case.WC: $P^I_{M,0} - M P^B_{M,0} C_p < 0.$ (24)

This means that the attackers feel that their own sensing results are not reliable enough (with
a high missed detection probability). Case.WC also implies that the attackers will definitely
not transmit if one or more attackers sense the channel busy. Due to assumption A4, the
reward in Eq. (24) is an increasing function of the number of attackers M. We can then
also write Eq. (24) as an upper bound on M, i.e., Case.WC corresponds to a small number
of attackers M.

• Strong Cooperation: The attackers will transmit when all attackers sense the channel idle,

Case.SC: $P^I_{M,0} - M P^B_{M,0} C_p \geq 0.$ (25)
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5fc{M) = ( 1 + iPii^-Psf7!-i^~P')iP^^^ fC,) - - PfY'ii - (1 - P7)(P^)'^^Cp) 1 - P, 



TI-W P' \^-Pf 



(29) 



This means that the attackers feel that their own sensing results (collectively) are accurate
enough (with a low missed detection probability) even taking the collision penalty $C_p$ into
consideration. We can also write Eq. (25) as a lower bound on M, i.e., Case.SC corresponds
to a large number of attackers M.
Obviously, it is more challenging to prevent attacks in Case.SC than in Case.WC. However, we can
show that in Case.SC the attackers' expected aggregate reward in one time slot with the punishment
triggered is always less than their reward when they always behave honestly. In other words, as
long as the attackers care enough about future rewards (i.e., the discount factor $\delta$ is high enough),
we can still prevent attacks even in Case.SC (and thus in Case.WC as well).

Lemma 1: The attackers' optimal long-term expected aggregate rewards in Case.WC and Case.SC 
are 



"sc = Pr A = j (P^o^ - PN,oC)j (26) 



and LR^^ in 



LPf^ = LP^^ + —^Pr I ^ Di = 




PN,oPriE.eM A = 0)(Pi,,o - MPf,,oC,) 



1 - S{Prij:.eM A > 0) + Pr (E.,A. A = 0)P^o) ' 

(28) 

Here the superscripts "H" and "DH" indicate honest and dishonest behaviors of the attackers,
respectively.

The proof of Lemma 1 is given in Appendix D, where we can show that $LR^{DH}_{WC} < LR^{H}_{WC}$
and $LR^{DH}_{SC} < LR^{H}_{SC}$ when $\delta$ goes close to 1. This leads to the following result.

Theorem 3: The indirect punishment can prevent all attacks in the "stay-with-attacks" scenario if
the discount factor $\delta$ satisfies the following condition:

• Weak cooperation (Case.WC): for any $1 \leq M < N$, we need $\delta > \delta^{th}_{WC}(M)$, where
5^ (M) = ^. (30) 



jFw P' V~Pf 
• Strong cooperation (Case.SC): for any $1 \leq M < N$, we need $\delta > \delta^{th}_{SC}(M)$, where $\delta^{th}_{SC}(M)$ is
given in Eq. (29).

If the fusion center does not know the number of attackers M, it can choose $\delta > \max_{0<M<N} \delta^{th}_{WC}(M)$
and $\delta > \max_{0<M<N} \delta^{th}_{SC}(M)$ for the two cases, respectively.

Fig. 6. Discount factor threshold $\delta^{th}_{SC}(M)$ with $(P_I, P_f, P_m, C_p) = (0.6, 0.08, 0.08, 3e+18)$ (curves shown for N = 15 and N = 14).

Although it is not shown in Theorem 3, we want to mention that the indirect punishment
can still partially prevent attacks even if $\delta$ is less than the discount factor threshold. Intuitively,
attackers do not want to trigger the indirect punishment and lose the opportunity to overhear honest
SUs' sensing results. Thus they will behave more conservatively compared to the case with no
indirect punishment. For example, if some SUs' sensing results indicate a busy channel state,
the attackers will not attack, so as to avoid triggering the long-term punishment.

We have the following interesting observations.

Observation 3: (Impact of network size) Both $\delta^{th}_{WC}(M)$ in Case.WC and $\delta^{th}_{SC}(M)$ in Case.SC
are increasing in the number of honest SUs $N - M$. The threshold $\delta^{th}_{WC}(M)$ is decreasing in
the number of attackers M, while $\delta^{th}_{SC}(M)$ is increasing in the number of attackers M.

Figure 6 plots $\delta^{th}_{SC}(M)$ as a function of N and M in Case.SC. The corresponding result in
Case.WC can also be obtained based on Eq. (30).

With more honest SUs $N - M$, attackers have less incentive to share the spectrum with
honest SUs in the long term and a higher incentive to attack and transmit exclusively. Thus a
higher $\delta$ is needed to prevent attacks.
A larger number of attackers M has two effects: (a) a higher total collision penalty (whenever a
collision happens), and (b) a better estimate of the channel condition by the attackers (once the punishment
is triggered). It turns out that effect (a) dominates in Case.WC and effect (b) dominates in Case.SC,
which explains why the $\delta$ threshold decreases in M in Case.WC and increases in M in Case.SC.
In Fig. 6, the most attack-vulnerable case happens when almost all SUs are attackers ($M \to N$),
in which case $LR^{DH}_{SC} \approx LR^{H}_{SC}$ and $\delta^{th}_{SC}(M)$ in (29) is close to 1.

Observation 4: (Impact of collision penalty $C_p$) $\delta^{th}_{WC}(M)$ in Case.WC is increasing in the
collision penalty $C_p$, while $\delta^{th}_{SC}(M)$ in Case.SC is decreasing in $C_p$.

In Case.WC, the collision penalty $C_p$ only affects the time slots before the punishment is
triggered. A higher $C_p$ means a smaller long-term expected reward as a conservative honest SU
(by comparing $LR^{H}_{WC}$ in Eq. (26) to $LR^{DH}_{WC}$ in Eq. (27)), and thus more incentive to attack. In
Case.SC, a larger value of $C_p$ hurts the reward of attackers more after the punishment than before
the punishment. This is because the transmission probability before the punishment is $\Pr\left(\sum_{i \in \mathcal{N}} D_i = 0\right)$
(i.e., all SUs sense idle), which is smaller than the transmission probability after the punishment,
$\Pr\left(\sum_{i \in \mathcal{M}} D_i = 0\right)$ (i.e., all attackers sense idle). Thus a larger $C_p$ discourages attacks in
Case.SC.

VII. Conclusions and Future Work 

Collaborative spectrum sensing is vulnerable to sensing data falsification attacks. In this paper, 
we focused on a challenging attack scenario in which multiple cooperative attackers can overhear 
the honest SU sensing reports, but the honest SUs are unaware of the existence of attackers. We 
first analyzed all possible attack scenarios without any attack-prevention mechanisms. In this 
case, we showed that honest SUs will have no chance to transmit and may even suffer from the 
collision penalty charged by the PU. Then, we proposed two attack-prevention mechanisms with 
direct and indirect punishments. Neither mechanism requires identification of the attackers.
The direct punishment can effectively prevent all attacks in both the "attack-and-run" and
"stay-with-attacks" scenarios, and the indirect punishment can prevent all attacks in the long run if the attackers
care enough about their future rewards.

There are several possible ways to extend the results in this paper. 

• First, we can consider the case where the PU's traffic changes slowly over time (e.g., TV
transmitters), where we need to consider the correlation between spectrum occupancies
over different time slots. In this case, we would need to use a much more complicated MDP
model than the one in Section VI.

• Second, we can study the case that the fusion center knows all SUs' transmission character- 
istics (e.g., modulation and coding schemes) and can monitor attackers' sensing information 
communication in Phase I and attackers' transmissions in Phase II. In this case, the fusion 
center may be able to identify attackers. However, the attackers can change their modulation 
and coding schemes (e.g., as some honest SUs) or secure their transmissions (via MAC-layer 
encryptions) to avoid being identified. 

• Third, we can study denial-of-service attacks. Throughout this paper, we consider that
attackers are rational and are only interested in maximizing their own rewards. For
denial-of-service attacks, however, the attackers' objective is to make honest SUs lose transmission
opportunities or to break down the effectiveness of collaborative sensing.

• Finally, we can consider an imperfect control channel between the SUs and the fusion center (e.g.,
some SUs receive a false announcement from the fusion center). In that case, an indirect
punishment can be triggered by channel communication errors instead of attacks. We would
need to design an indirect punishment that resumes collaborative sensing after a
period of time (instead of an infinitely long punishment).

Appendix 

A. Proof of Theorem 2

By examining different possible states, we can derive the attackers' optimal behaviors. Then,
in response to the attackers' optimal behaviors, we find the proper value of the direct punishment
$C_b$ to prevent all attacks.

1) $\left(\sum_{i \in \mathcal{N} \setminus \mathcal{M}} D_i = 0, \sum_{i \in \mathcal{M}} D_i = 0\right)$: When the sensing results are all 0, the attackers
may report truthfully or falsely in Phase I:

• If all attackers report 0 in Phase I, then the announcement at the fusion center is $\mathcal{H}_0$ (idle),
and all honest SUs will transmit. It is easy to check that all attackers will transmit with
the positive expected aggregate reward derived in Section IV-A.

• If some attackers report 1 in Phase I, then the announcement at the fusion center is $\mathcal{H}_1$
(busy) and all honest SUs will not transmit.

- If some attackers, numbering $1 \leq M_t \leq M$, choose to transmit, the attackers' expected
aggregate reward is:

$R_a(s) = P^I_{N,0} - M P^B_{N,0} (C_p + C_b),$ (31)

which does not depend on $M_t$ and may not be larger than the reward under truthful reporting.

- If all attackers wait, the attackers' expected aggregate reward equals 0 and is less than
the reward under truthful reporting.

To prevent attacks in this state, a high value of $C_b$ should be set to make the reward under truthful reporting larger than (31).

2) $\left(\sum_{i \in \mathcal{N} \setminus \mathcal{M}} D_i = 0, \sum_{i \in \mathcal{M}} D_i = \tilde{M} \geq 1\right)$: The attackers may report truthfully or falsely in
Phase I:

• If all attackers report 0 in Phase I, then the announcement at the fusion center is $\mathcal{H}_0$ (idle),
and all honest SUs will transmit. It is easy to check that all attackers will transmit and their
expected aggregate reward is given by (15), which is negative.

• If some attackers report 1 in Phase I, then the announcement at the fusion center is $\mathcal{H}_1$
(busy), and all honest SUs will not transmit.

- If some attackers, numbering $1 \leq M_t \leq M$, choose to transmit, the attackers' expected
aggregate reward is:

$R_a(s) = P^I_{N,\tilde{M}} - M P^B_{N,\tilde{M}} (C_p + C_b),$ (32)

which does not depend on $M_t$ and may be negative.

- If all attackers wait, the attackers' expected aggregate reward equals 0, which is larger
than (15).

To prevent attacks in this state, a high value of $C_b$ should be set to make (32) smaller than 0.

3) $\left(\sum_{i \in \mathcal{N} \setminus \mathcal{M}} D_i = K \geq 1, \sum_{i \in \mathcal{M}} D_i = \tilde{M} \in \{0, \ldots, M\}\right)$: When at least one honest SU's
sensing decision is 1, then no matter what the attackers report in Phase I, the fusion center always
makes the correct announcement $\mathcal{H}_1$ (busy). All honest SUs will not transmit in Phase II.

• If some attackers, numbering $1 \leq M_t \leq M$, choose to transmit in Phase II, the attackers'
expected aggregate reward is:

$R_a(s) = P^I_{N,K+\tilde{M}} - M P^B_{N,K+\tilde{M}} (C_p + C_b),$ (33)

which does not depend on $M_t$ and can be positive or negative.

• If all attackers wait in Phase II, the attackers' expected aggregate reward equals 0.

To prevent all attacks in this state, a high value of $C_b$ should be set to make (33) smaller than 0.

Then we can summarize the requirement on $C_b$ to prevent attacks in all possible states in
Theorem 2.
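The requirement that exclusive transmission be unprofitable in a given state can be turned into a per-state lower bound on the punishment. The sketch below assumes the reward form "posterior idle probability minus M times posterior busy probability times $(C_p + C_b)$" with the rate normalized to 1. It illustrates a single state only and is not the closed-form threshold of Theorem 2; the function names are hypothetical.

```python
def exclusive_reward_with_punishment(p_idle_post, n_attackers, c_p, c_b):
    """Attackers' aggregate reward for exclusive transmission after a 'busy'
    announcement, when each attacker pays c_p + c_b upon a collision."""
    p_busy_post = 1.0 - p_idle_post
    return p_idle_post - n_attackers * p_busy_post * (c_p + c_b)


def min_direct_punishment(p_idle_post, n_attackers, c_p):
    """Smallest c_b at which the reward above drops to zero.

    Any c_b strictly above this value makes waiting (reward 0) strictly better.
    Requires p_idle_post < 1 so that a collision has positive probability.
    """
    p_busy_post = 1.0 - p_idle_post
    return p_idle_post / (n_attackers * p_busy_post) - c_p
```

For instance, with a posterior idle probability of 0.5, two attackers, and $C_p = 0.1$, the bound is 0.4; the bound shrinks as the number of attackers grows because the total penalty scales with M.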

B. Relaxation of Assumptions A1 and A3

Here we relax Assumption A1 and consider the general case where SUs have heterogeneous
detection performances (i.e., different false alarm probabilities $P_f$ and missed detection
probabilities $P_m$) and transmission rates. We are interested in whether our attack-prevention
mechanisms (with some minor modification of system parameters) still apply, and how to
change the punishments to the attackers. Due to the page limit, we only examine the direct punishment
here. The effectiveness of the indirect punishment can be shown similarly.

Observation 1 in Section V showed that the single-attacker scenario (M = 1) is the most
challenging attack scenario for the direct punishment mechanism. Thus we will focus on the
single-attacker scenario to check the effectiveness of this mechanism. We label the attacker as
the Nth SU, and we denote its false alarm probability and missed detection probability as $P_{f,A}$
and $P_{m,A}$, respectively. For ease of analysis, we still consider all honest SUs to have the same
$P_f$ and $P_m$.^{15}

We denote the attacker's transmission rate as $r_A$, and an honest SU $i < N$ has a different
transmission rate $r_i$. When all SUs share the same transmission opportunity using TDMA mode,
the attacker obtains a data rate of $r_A/N$.

First of all, we need to modify the two conditional probability notations as follows. When $0 \leq k \leq N-1$
honest SUs sense the channel busy ($\sum_{i=1}^{N-1} D_i = k$) and the attacker senses idle ($D_N = 0$), the
conditional probability that the channel is actually idle is

$P^I_{N,(k)+(0)} := \Pr\left(\text{idle} \mid \sum_{i=1}^{N-1} D_i = k, D_N = 0\right) = \frac{P_I (1-P_f)^{N-1-k} P_f^k (1-P_{f,A})}{P_I (1-P_f)^{N-1-k} P_f^k (1-P_{f,A}) + (1-P_I)(1-P_m)^k P_m^{N-1-k} P_{m,A}}.$ (34)

The conditional probability that the channel is actually busy is

$P^B_{N,(k)+(0)} = 1 - P^I_{N,(k)+(0)}.$ (35)

When $0 \leq k \leq N-1$ honest SUs sense the channel busy ($\sum_{i=1}^{N-1} D_i = k$) and the attacker
also senses busy ($D_N = 1$), the conditional idle and busy probabilities are, respectively,

$P^I_{N,(k)+(1)} := \Pr\left(\text{idle} \mid \sum_{i=1}^{N-1} D_i = k, D_N = 1\right) = \frac{P_I (1-P_f)^{N-1-k} P_f^k P_{f,A}}{P_I (1-P_f)^{N-1-k} P_f^k P_{f,A} + (1-P_I)(1-P_m)^k P_m^{N-1-k} (1-P_{m,A})}$ (36)

and

$P^B_{N,(k)+(1)} = 1 - P^I_{N,(k)+(1)}.$ (37)

^{15} If we consider different detection performances for honest SUs, the analysis becomes more complicated without adding
more meaningful insights. Intuitively, as honest SUs' overall detection performance becomes more precise, the attacker can
predict the channel state more precisely by overhearing honest SUs' reports.

With the help of the above notations, we can similarly analyze how the direct punishment works
and how to determine the punishment as in Section V.
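The two conditional idle probabilities for the single-attacker case can be computed with one function. This is a minimal sketch of the Bayes posterior with N - 1 identical honest SUs and one attacker with its own sensing accuracy; the function and parameter names are illustrative.

```python
def posterior_idle_single_attacker(n_total, k_honest_busy, attacker_busy,
                                   p_idle, p_f, p_m, p_f_a, p_m_a):
    """Pr(idle | k honest SUs sensed busy, attacker sensed busy or idle).

    p_f, p_m: false alarm / missed detection of each honest SU.
    p_f_a, p_m_a: false alarm / missed detection of the attacker.
    """
    n_honest = n_total - 1
    k = k_honest_busy
    # Likelihood of the observations when the channel is idle.
    attacker_idle_term = p_f_a if attacker_busy else 1 - p_f_a
    w_idle = p_idle * (1 - p_f) ** (n_honest - k) * p_f ** k * attacker_idle_term
    # Likelihood of the observations when the channel is busy.
    attacker_busy_term = (1 - p_m_a) if attacker_busy else p_m_a
    w_busy = ((1 - p_idle) * (1 - p_m) ** k * p_m ** (n_honest - k)
              * attacker_busy_term)
    return w_idle / (w_idle + w_busy)
```

The corresponding busy probability is one minus the returned value. Setting the attacker's accuracy equal to the honest SUs' accuracy recovers the homogeneous case.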

Theorem 4: For the single attacker in the network, there exists a threshold 



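$C_b^{th} = \frac{P_I (1 - P_{f,A})}{(1 - P_I) P_{m,A}} \left(\frac{1 - P_f}{P_m}\right)^{N-1} \frac{N-1}{N} r_A$ (38)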
such that any value $C_b > C_b^{th}$ can prevent all attack scenarios described in Section IV.

Proof. By examining different possible states as in Section IV, we can derive the attacker's
optimal behavior. Then, in response to the attacker's optimal behavior, we find the proper value
of the direct punishment $C_b$ to prevent all attacks.

1) State $s = \left(\sum_{i=1}^{N-1} D_i = 0, D_N = 0\right)$: When the sensing results are all 0, the attacker
may report truthfully or falsely in Phase I:

• If the attacker reports 0 in Phase I, then the announcement at the fusion center is $\mathcal{H}_0$ (idle),
and all honest SUs will transmit. It is easy to check that the attacker will also transmit and
receive a positive expected reward

$R_a(s) = P^I_{N,(0)+(0)} \frac{r_A}{N} - P^B_{N,(0)+(0)} C_p > 0,$ (39)

which is similar to (3).

• If the attacker reports 1 in Phase I, then the announcement at the fusion center is $\mathcal{H}_1$ (busy)
and all honest SUs will not transmit.

- If the attacker chooses to transmit, its expected reward is

$R_a(s) = P^I_{N,(0)+(0)} r_A - P^B_{N,(0)+(0)} (C_p + C_b),$ (40)

which may or may not be larger than (39).
- If the attacker waits, its expected reward equals 0 and is less than (39).
To prevent attacks in this state, a high value of $C_b$ should be set to make (39) larger than (40).
In other words,

$C_b > \frac{P_I (1 - P_{f,A})}{(1 - P_I) P_{m,A}} \left(\frac{1 - P_f}{P_m}\right)^{N-1} \frac{N-1}{N} r_A,$ (41)

where we denote the term on the right-hand side as threshold $C_b^{th1}$. It is easy to check that $C_b^{th1}$
is decreasing in both $P_{f,A}$ and $P_{m,A}$, but is increasing in $P_I$, N, and $r_A$.

2) State s = Di = 0, Dn = ■' The attacker may report truthfully or falsely in 
Phase I: 

• If the attacker reports in Phase I, then the announcement at the fusion center is "Hq (idle), 
and all honest SUs will transmit. It is easy to check that the attacker will also transmit and 
its expected reward is 

Ra{s) = PN,(o)+(i)j;^fA — PN,(o)+(i)Cb, (42) 
which is negative as required by the optimality of OR-rule. 

• If the attacker reports 1 in Phase I, then the announcement at the fusion center is $\mathcal{H}_1$ (busy), 
and all honest SUs will not transmit. 

- If the attacker chooses to transmit, its expected reward is 

$R_A(s) = P^{I}_{N,(0)+(1)} r_A - P^{B}_{N,(0)+(1)} (C_p + C_b)$, (43) 
which may or may not be negative. 

- If the attacker waits, its expected reward equals 0, which is larger than (42). 

To prevent attacks in this state, $C_b$ should be set high enough to make (43) smaller than 0. 
This gives 

$C_b > \frac{P^{I}_{N,(0)+(1)}}{P^{B}_{N,(0)+(1)}} r_A - C_p$, (44) 

where we denote the right-hand side as threshold $C_b^{th2}$. 

3) State $s = (\sum_{i=1}^{N-1} D_i = K \geq 1, D_N = M \in \{0, 1\})$: When at least one honest SU's 
sensing decision is 1, then no matter what the attacker reports in Phase I, the fusion center 
always makes the correct announcement $\mathcal{H}_1$ (busy). All honest SUs will not transmit in Phase II. 

• If the attacker chooses to transmit in Phase II, its expected reward is 

$R_A(s) = P^{I}_{N,(K)+(M)} r_A - P^{B}_{N,(K)+(M)} (C_p + C_b)$, (45) 
which may be positive or negative. 
• If the attacker waits in Phase II, its expected reward equals 0. 
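To prevent all attacks in this state, $C_b$ should be set high enough to make (45) smaller than 
0. This gives 

$C_b > \frac{P_I}{1 - P_I} \left( \frac{1 - P_f}{P_m} \right)^{N-1} \left[ \frac{P_f P_m}{(1 - P_f)(1 - P_m)} \right]^{K} \left( \frac{1 - P_{f,A}}{P_{m,A}} \right) \left[ \frac{P_{f,A} P_{m,A}}{(1 - P_{f,A})(1 - P_{m,A})} \right]^{M} r_A - C_p$, (46) 

the right-hand side of which is maximized when $K = 1$ and $M = 0$ due to $P_m < 0.1$ and $P_f < 0.1$, as explained in 
footnote 13. To prevent all attacks in this state, we should require 

$C_b > \frac{P_I}{1 - P_I} \left( \frac{1 - P_f}{P_m} \right)^{N-1} \left[ \frac{P_f P_m}{(1 - P_f)(1 - P_m)} \right] \frac{1 - P_{f,A}}{P_{m,A}} r_A - C_p$, (47) 

where the right-hand side is denoted as threshold $C_b^{th3}$. 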

To summarize, the requirement on $C_b$ to prevent attacks in all possible states is 
$C_b > \max\{C_b^{th1}, C_b^{th2}, C_b^{th3}\}$. It is easy to check that $C_b^{th1} > C_b^{th2}$ and 
$C_b^{th1} > C_b^{th3}$, so $C_b^{th} = C_b^{th1}$, which establishes the theorem. ■ 
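
As a numerical sanity check of the three-state comparison in the proof, the following Python sketch evaluates the thresholds obtained by setting (39) > (40), (43) < 0, and (45) < 0 (with $K = 1$, $M = 0$). It assumes that sensing results are conditionally independent given the channel state. The function names and the values chosen for $P_I$, $r_A$, $C_p$, $P_{f,A}$, and $P_{m,A}$ are illustrative assumptions and are not taken from the paper; only $(P_f, P_m, N) = (0.05, 0.05, 11)$ follows the setting of Fig. 7.

```python
def posterior_odds(p_idle, p_f, p_m, n_honest, k, p_f_a, p_m_a, m):
    """Odds P(idle | reports) / P(busy | reports).

    k of the n_honest honest SUs sense "busy"; m in {0, 1} is the attacker's
    own sensing result. Assumes conditionally independent sensing.
    """
    idle = p_idle * p_f**k * (1 - p_f) ** (n_honest - k)
    idle *= p_f_a if m == 1 else (1 - p_f_a)
    busy = (1 - p_idle) * (1 - p_m) ** k * p_m ** (n_honest - k)
    busy *= (1 - p_m_a) if m == 1 else p_m_a
    return idle / busy


def thresholds(p_idle, p_f, p_m, n, p_f_a, p_m_a, r_a, c_p):
    """Return the three state-wise lower bounds on the direct punishment."""
    h = n - 1  # number of honest SUs
    # State 1: all reports are 0; require (39) > (40).
    th1 = posterior_odds(p_idle, p_f, p_m, h, 0, p_f_a, p_m_a, 0) * r_a * (n - 1) / n
    # State 2: honest SUs sense 0, attacker senses 1; require (43) < 0.
    th2 = posterior_odds(p_idle, p_f, p_m, h, 0, p_f_a, p_m_a, 1) * r_a - c_p
    # State 3: worst case K = 1, M = 0; require (45) < 0.
    th3 = posterior_odds(p_idle, p_f, p_m, h, 1, p_f_a, p_m_a, 0) * r_a - c_p
    return th1, th2, th3


# Illustrative values only (assumed, except P_f, P_m, N).
th1, th2, th3 = thresholds(
    p_idle=0.5, p_f=0.05, p_m=0.05, n=11, p_f_a=0.05, p_m_a=0.05, r_a=1.0, c_p=1.0
)
print(th1, th2, th3)
```

Under these assumed values the first threshold dominates the other two, which agrees with the conclusion of the proof; other parameter choices can be tested the same way.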

Observation 5: $C_b^{th}$ is increasing in both the idle probability $P_I$ and the number of honest SUs 
$N - 1$. 

A larger $P_I$ means a higher channel availability, and thus encourages the attacker to launch 
an attack so that it can exclusively utilize the channel more frequently. Also, as the number of 
honest SUs $N - 1$ increases, the attacker overhears more honest SUs' sensing reports. The 
attacker can then estimate the actual channel state more accurately, and it is more likely to launch an 
attack. 

Observation 6: Threshold $C_b^{th}$ is increasing in the attacker's rate $r_A$, and it is decreasing in 
the attacker's false alarm probability $P_{f,A}$ and missed detection probability $P_{m,A}$. 

As the attacker's rate increases, it values the exclusive transmission opportunity more. The 
attacker has a higher incentive to attack, and a higher $C_b^{th}$ is required to prevent the attack. 
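
The monotonicity claims in Observation 6 can be checked numerically. The sketch below is a minimal illustration that assumes the threshold takes the form of the right-hand side of (41) under conditionally independent sensing; the function name, the grid, and the values $P_I = 0.5$ and $r_A = 1$ are assumptions made for this example, while $(P_f, P_m, N) = (0.05, 0.05, 11)$ follows Fig. 7.

```python
def direct_punishment_threshold(p_idle, p_f, p_m, n, p_f_a, p_m_a, r_a):
    """Right-hand side of (41), assuming conditionally independent sensing."""
    odds = (p_idle / (1 - p_idle)) * ((1 - p_f) / p_m) ** (n - 1)
    odds *= (1 - p_f_a) / p_m_a
    return odds * r_a * (n - 1) / n


# Sweep the attacker's false alarm and missed detection probabilities.
grid = [0.02, 0.04, 0.06, 0.08, 0.10]
table = {
    (p_f_a, p_m_a): direct_punishment_threshold(0.5, 0.05, 0.05, 11, p_f_a, p_m_a, 1.0)
    for p_f_a in grid
    for p_m_a in grid
}

for p_f_a in grid:
    row = ["%.3e" % table[(p_f_a, p_m_a)] for p_m_a in grid]
    print("P_fA=%.2f:" % p_f_a, " ".join(row))
```

Each row (fixed $P_{f,A}$) and each column (fixed $P_{m,A}$) of the printed table decreases, and doubling `r_a` doubles every entry, in line with the observation.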

Figure 7 shows the threshold $C_b^{th}$ as a function of the attacker's false alarm probability $P_{f,A}$ 
and missed detection probability $P_{m,A}$. Intuitively, as $P_{f,A}$ increases, the attacker is more 
likely to overlook a channel access opportunity and has less incentive to launch an 
attack. As $P_{m,A}$ increases, the attacker is more likely to trigger the direct punishment and 
thus attacks more conservatively. In both cases, the fusion center can announce a lower 
$C_b$ to prevent all attacks. 
Fig. 7. Direct punishment threshold $C_b^{th}$ for different $P_{f,A}$ and $P_{m,A}$ with $(P_f, P_m, N) = (0.05, 0.05, 11)$. 



C. Attack Prevention in Case.AT of Section VI 

Section VI focuses on Case.NT. Here we discuss Case.AT with multiple attackers ($M > 1$). The 
table of attacker actions in Section IV shows that with no punishment, the attackers may still obtain a positive reward 
by transmitting even when at least one SU senses the channel busy in Case.AT. Next, we discuss 
the attackers' action $u(s)$ under any given state $s = (\sum_{i \in \mathcal{N} \setminus \mathcal{M}} D_i, \sum_{i \in \mathcal{M}} D_i, \text{Punishment})$. 
• State $s = (\sum_{i \in \mathcal{N} \setminus \mathcal{M}} D_i, \sum_{i \in \mathcal{M}} D_i, \text{off})$: Before the indirect punishment is triggered, 
the attackers may purposely change their attack behaviors (compared to the actions in 
the no-punishment table in Section IV) in some states to avoid triggering punishment. 
- $s = (0, 0, \text{off})$: at least one attacker still chooses the action (busy, transmit) as in 
the no-punishment case. In this state, the attackers obtain the largest expected aggregate reward 
$P^{I}_{N,0} r - M P^{B}_{N,0} C_p$ and induce the smallest missed detection probability $P^{B}_{N,0}$ 
that triggers punishment. 

- $s = (\sum_{i \in \mathcal{N} \setminus \mathcal{M}} D_i = 0, \sum_{i \in \mathcal{M}} D_i \geq 1, \text{off})$: the attackers may choose not to attack 
even if they can obtain a positive expected aggregate reward in the current time slot, which 
differs from the no-punishment scenario. This is because the attackers 
fear triggering the long-term indirect punishment, and they will attack only when the missed 
detection probability that triggers punishment is low, i.e., when $\sum_{i \in \mathcal{N}} D_i$ is small. 
• State $s = (\text{Unknown}, \sum_{i \in \mathcal{M}} D_i, \text{on})$: After the indirect punishment is triggered, the 
attackers know their own sensing results and will choose their actions as follows. 

- Weak Cooperation (Case.WC): the attackers will choose the action (N/A, wait) even if 
all attackers sense the channel idle. Otherwise, they will receive a negative expected 
aggregate reward $P^{I}_{M,0} r - M P^{B}_{M,0} C_p$. 

- Strong Cooperation (Case.SC): the attackers will choose the action (N/A, transmit) 
when all attackers sense the channel idle. Even if at least one attacker senses the 
channel busy, they will still choose the action (N/A, transmit) if $P^{I}_{M,K} r > M P^{B}_{M,K} C_p$. 

Lemma 2: The attackers' optimal long-term expected aggregate rewards in Case.WC and Case.SC 
are 



Y^^^ {Y. = o) (^io^ - Pn,oCp^ , (48) 



by behaving honestly, and 

LP^^ = max — — ^ TT^^^T^ (49) 



6 P^,kPr{E^eM A = 0) {Pko - MP^.C,) 
+ 1 - 5 1 - 5(1 - ELo^-(E.e.^ A = A:)P#,,) J' 
by behaving dishonestly. Here the superscript "H" indicates honest behavior of the attackers and 
"DH" indicates dishonest behavior. We denote the value of $z$ that achieves the maximum of 
(49) in Case.WC or (50) in Case.SC as $z^*$. The attackers' optimal policy $u^*$ has a threshold 
structure as follows. 

• If $\sum_{i \in \mathcal{N}} D_i \leq z^*$, at least one attacker will take the action (busy, transmit) before the 
indirect punishment is triggered. After the indirect punishment is triggered, 

- Case.WC: all attackers will take the action (N/A, wait). 

- Case.SC: all attackers will take the action (N/A, transmit) if $P^{I}_{M,K} r > M P^{B}_{M,K} C_p$, where 
$K$ attackers sense the channel busy. Otherwise, they will take the action (N/A, wait). 

• If $\sum_{i \in \mathcal{N}} D_i > z^*$, all attackers will take the action (busy, wait) before the indirect 
punishment is triggered. After the indirect punishment is triggered, 

- Case.WC: all attackers will take the action (N/A, wait). 
- Case.SC: all attackers will take the action (N/A, transmit) if $P^{I}_{M,K} r > M P^{B}_{M,K} C_p$, where 
$K$ attackers sense the channel busy. Otherwise, they will take the action (N/A, wait). 
From Lemma 2, we can show that $LR^{DH}_{WC} < LR^{H}_{WC}$ in Case.WC and $LR^{DH}_{SC} < LR^{H}_{SC}$ in Case.SC 
when $\delta$ is close to 1. Thus we can find an appropriate discount factor threshold to ensure that 
$LR^{DH}_{WC} < LR^{H}_{WC}$ for Case.WC and $LR^{DH}_{SC} < LR^{H}_{SC}$ for Case.SC. 

Theorem 5: For multiple attackers in Case.AT, there exists a threshold $\delta^{th} \in (0, 1)$ such that 
no attacks will happen if $\delta > \delta^{th}$. 



D. Proof of Lemma 1 

In Case.NT, before the indirect punishment is triggered, the attackers will attack with the action 
(busy, transmit) only when all SUs sense the channel idle. After the punishment is triggered, 
they may or may not transmit. 

1) Case.WC (Weak Cooperation): In Case.WC, the attackers will not transmit even when all 
attackers sense the channel idle after the punishment is triggered. 

If the attackers behave as honest SUs, the indirect punishment will never be triggered. They 
share the spectrum opportunities with the honest SUs when all SUs sense the channel idle. The 
attackers' long-term expected aggregate reward is 

$LR^{H}_{WC} = \sum_{t=0}^{\infty} \delta^{t} \Pr\left( \sum_{i \in \mathcal{N}} D_i = 0 \right) \left( P^{I}_{N,0} \frac{r}{N} - P^{B}_{N,0} C_p \right) M$, 

which can be rewritten as in (26). 

If the attackers attack with the action (busy, transmit) when all SUs sense the channel idle 
($\sum_{i \in \mathcal{N}} D_i = 0$), then in time slot $t = 0$ the indirect punishment will be triggered with the missed 
detection probability $P^{B}_{N,0}$. If no collision happens, which occurs with probability $P^{I}_{N,0}$, the attack will not 
be detected and the attackers will attack again if all SUs sense the channel idle in the next time 
slot. By focusing on time slot $t = 0$, the attackers' long-term expected aggregate reward is 

$LR^{DH}_{WC} = \Pr\left( \sum_{i \in \mathcal{N}} D_i > 0 \right) \delta LR^{DH}_{WC} + \Pr\left( \sum_{i \in \mathcal{N}} D_i = 0 \right) \left[ P^{I}_{N,0} \left( r + \delta LR^{DH}_{WC} \right) - P^{B}_{N,0} (M C_p) \right]$. 

We can then recursively rewrite $LR^{DH}_{WC}$ as in (27). 
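
The Case.WC recursion can be solved in closed form and compared with the honest reward. The sketch below is a hedged illustration: it assumes $P^{I}_{N,0} + P^{B}_{N,0} = 1$ (posterior probabilities given that all SUs sense idle), treats $\Pr(\sum_{i \in \mathcal{N}} D_i = 0)$ as a given constant, and uses made-up parameter values. All function names are hypothetical and do not come from the paper.

```python
def honest_reward(delta, pr_all_idle, p_i0, r, n, m, c_p):
    """Discounted sum of the per-slot honest aggregate reward (geometric series)."""
    per_slot = pr_all_idle * (p_i0 * r / n - (1 - p_i0) * c_p) * m
    return per_slot / (1 - delta)


def attack_reward_wc(delta, pr_all_idle, p_i0, r, m, c_p):
    """Closed-form solution of the Case.WC recursion for the attack reward."""
    p_b0 = 1 - p_i0  # assumed: idle and busy posteriors sum to one
    gain = pr_all_idle * (p_i0 * r - m * p_b0 * c_p)
    return gain / (1 - delta * (1 - pr_all_idle * p_b0))


def attack_reward_wc_iterative(delta, pr_all_idle, p_i0, r, m, c_p, iters=2000):
    """Fixed-point iteration of the same recursion, used as a cross-check."""
    p_b0 = 1 - p_i0
    lr = 0.0
    for _ in range(iters):
        lr = (1 - pr_all_idle) * delta * lr + pr_all_idle * (
            p_i0 * (r + delta * lr) - p_b0 * m * c_p
        )
    return lr


def discount_threshold(pr_all_idle, p_i0, r, n, m, c_p):
    """Discount factor at which honest and attack rewards are equal."""
    a = pr_all_idle * (p_i0 * r / n - (1 - p_i0) * c_p) * m  # honest per-slot reward
    b = pr_all_idle * (p_i0 * r - m * (1 - p_i0) * c_p)      # one-shot attack reward
    c = 1 - pr_all_idle * (1 - p_i0)                          # attack survival factor
    return (b - a) / (b - a * c)


# Illustrative parameters (all assumed).
params = dict(pr_all_idle=0.3, p_i0=0.99, r=1.0, m=2, c_p=5.0)
d_th = discount_threshold(n=10, **params)
print("discount factor threshold:", d_th)
```

With these assumed numbers the honest per-slot reward is positive and smaller than the one-shot attack reward, so the threshold lies strictly between 0 and 1, and honest behavior yields the larger long-term reward for any discount factor above it.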

2) Case.SC (Strong Cooperation): In Case.SC, the attackers will still transmit when all 
attackers sense the channel idle after the punishment is triggered. 
If the attackers behave as honest SUs, no indirect punishment will be triggered and they will 
receive the same long-term expected aggregate reward as in (26). 

If the attackers attack with the action (busy, transmit) when all SUs sense the channel idle in 
time slot $t = 0$, we can derive the attackers' long-term expected aggregate reward in the same way as in 
Case.WC, 



LR^^ = Pr[}_^D,>0]6LRic 



Pr [5^A = 0) 



\i€M J 

The only difference here is that the attackers can still obtain a positive expected aggregate reward 
after the punishment is triggered. Then we can recursively rewrite $LR^{DH}_{SC}$ as in (28). 